Transcoding compressed videos to reduced resolution videos

ABSTRACT

A method and system transcodes an input video to a lower spatial resolution. The input video is first decoded into pictures. Each picture includes a set of macroblocks. Each picture is sub-sampled to a downscaled picture having a lower spatial resolution. A quantization scale is selected for each macroblock in the downscaled picture. A set of motion vectors is generated for each macroblock in the downscaled picture. A multiplier value based on the quantization scale is determined for each macroblock in the downscaled picture. One of a plurality of encoding modes is selected for each macroblock in the downscaled picture according to the quantization scale, the motion vectors, and the multiplier value. Then, each macroblock in each downscaled picture is encoded according to the quantization scale, the selected encoding mode, and the set of motion vectors to produce an output video having a lower spatial resolution than the input video.

FIELD OF THE INVENTION

[0001] The invention relates generally to video transcoding, and more particularly to transcoding compressed videos from a higher spatial resolution to a lower spatial resolution.

BACKGROUND OF THE INVENTION

[0002] Video transcoding converts video bit streams from one coding format to other formats. The transcoding can consider syntax, bit rate, and resolution conversions. Transcoders can be used at the source or destination of videos, or in between, e.g., in video servers, network routers, and video receivers. Transcoders enable the delivery of videos to a variety of devices having different network connections or display capabilities, see U.S. Pat. No. 6,483,851, “System for network transcoding of multimedia data flow,” issued to Neogi on Nov. 19, 2002, U.S. Pat. No. 6,490,320, “Adaptable bitstream video delivery system,” issued to Vetro, et al. on Dec. 3, 2002, and U.S. Pat. No. 6,345,279, “Methods and apparatus for adapting multimedia content for client devices,” issued to Li, et al. on Feb. 5, 2002.

[0003] The above patents focus on higher-level system design issues. However, detailed information describing the transcoding of video is not provided. In particular, those patents do not disclose how quantization parameters and conversion modes for macroblocks are determined.

[0004] Recently, there is an increased demand for video transcoding with spatial resolution reduction. Such requirements come from high-definition TV (HDTV) broadcasting and DVD applications, etc. In order to display HDTV programs on standard definition TV (SDTV), or to record the HDTV on the DVD recorder, it is necessary to convert a high resolution HDTV video to a low resolution SDTV video. In addition, hand-held devices with small video displays and low bit rate wireless connections require video transcoding.

[0005] The reduction of spatial resolution has been described by Xin, et al., “An HDTV-to-SDTV spatial transcoder,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 11, November 2002, Yin, et al., “Drift compensation for reduced spatial resolution transcoding,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 12, No. 11, November 2002, Shanableh, et al., “Heterogeneous video transcoding to lower spatio-temporal resolutions and different encoding formats,” IEEE Transactions on Multimedia, Vol. 2, No. 2, June 2000, and Shen, et al., “Transcoder with arbitrarily resizing capability,” IEEE Proc. ISCAS 2001.

[0006] FIG. 1 shows the basic structure and operation of a typical prior art video transcoder 100. The transcoder 100 includes a decoder 110, a downscale filter 120, and an encoder 130 connected serially to each other. A macroblock (MB) mapper 140 is connected between the decoder and the encoder. An input video bitstream 101, with bit rate R1, is decoded 110 into YUV video frames. The decoded frames are then spatially downscaled 120 to lower resolution YUV frames. Concurrently, motion vectors and coding modes are extracted from the input bitstream by the MB mapper 140. The encoder 130 uses the extracted macroblock information to encode the filtered YUV frames into an output video stream 102 with a lower bit rate R2 and lower spatial resolution.

[0007] At the macroblock level, a variety of modes can be used to encode a video, depending on the coding standard. For example, in order to support interlaced video sequences, the MPEG-2 standard has several different macroblock coding modes, including intra mode, no motion compensation (MC) mode, frame/field motion compensation inter mode, forward/backward/interpolate inter mode, and frame/field DCT mode. As an advantage, the multiple modes provide better coding efficiencies due to their inherent adaptability.

[0008] However, the prior art focuses either on motion vector re-sampling or on motion re-estimation for spatial resolution reduction, without considering the best coding mode. For efficiency, the encoding modes for the output video stream are usually based on the coding modes for the input video stream, using majority-voting. The resulting modes are certainly sub-optimal. Other criteria for making mode decisions have also been described, but those coding modes are limited to the intra and inter decision, with similar disadvantages.

[0009] Systems and methods for optimally selecting a macroblock coding mode based on a quantization scale selected for the macroblock are described in U.S. Pat. No. 6,037,987, “Apparatus and method for selecting a rate and distortion based coding mode for a coding system,” issued to Sethuraman on Mar. 14, 2000, U.S. Pat. No. 6,192,081, “Apparatus and method for selecting a coding mode in a block-based coding system,” issued to Chiang, et al. on Feb. 20, 2001, and Sun, et al., “MPEG coding performance improvement by jointly optimizing coding mode decisions and rate control,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 7, No. 3, June 1997.

[0010] FIG. 2 shows a typical prior art system and method 200 for jointly optimizing the coding mode and the quantizer. That system 200 basically uses a brute force, trial-and-error method. The system 200 includes a quantization selector 210, a mode selector 220, a MB predictor 230, a discrete cosine transform (DCT) 240, a quantizer 250, a variable length coder (VLC) 260, and a cost function 270 to select an optimal quantization and mode 280. The optimal quantization and mode 280 are achieved by an iterative procedure for searching through a trellis to find a path that has a lowest cost. As the quantizer selector 210 changes its step size, e.g., 1 to 31, the mode selector 220 responds by selecting each mode for each macroblock, e.g., intra 221, no MC 222, MC frame 223, and MC field 224.

[0011] A macroblock level is predicted 230 in terms of a decoded picture type. Then, the forward DCT 240 is applied to each macroblock of a predictive residual signal to produce DCT coefficients. The DCT coefficients are quantized 250 with each step size in the quantization parameter set. The quantized DCT coefficients are entropy encoded using the VLC 260, and a bit rate 261 is recorded for later use. In parallel, a distortion calculation by means of mean-square-error (MSE) is performed over pixels in the macroblock resulting in a distortion value.

[0012] Next, the resulting bit rate 261 and distortion 251 are received into the rate-distortion module for cost evaluation 270. The rate-distortion function is constrained by a target frame budget imposed by a rate constraint R_(picture) 271. The cost evaluation 270 is performed on each value q in the quantization parameter set. The quantization scale and coding mode with the lowest cost value are selected for each macroblock.

[0013] In the prior art system, if Q denotes the set of all admissible quantizers, and M denotes the set of all admissible coding modes, then the complexity of the prior art system is Q×M. Because a single loop for each quantizer value involves DCT transformation, quantization, distortion and bit count calculation for each macroblock, the double loop for joint mode decision and quantizer selection in the prior art makes the complexity extremely high.

[0014] Given the above prior art, there is a need to provide a new system and method for video transcoding with spatial resolution reduction, which achieves the optimal solution for coding mode decision and motion vector selection with less complexity.

SUMMARY OF THE INVENTION

[0015] A method and system transcodes an input video to a lower spatial resolution. The input video is first decoded into pictures. Each picture includes a set of macroblocks.

[0016] Each picture is sub-sampled to a downscaled picture having a lower spatial resolution. A quantization scale is selected for each macroblock in the downscaled picture.

[0017] A set of motion vectors is generated for each macroblock in the downscaled picture. A multiplier value based on the quantization scale is determined for each macroblock in the downscaled picture.

[0018] One of a plurality of encoding modes is selected for each macroblock in the downscaled picture according to the quantization scale, the motion vectors, and the multiplier value.

[0019] Then, each macroblock in each downscaled picture is encoded according to the quantization scale, the selected encoding mode, and the set of motion vectors to produce an output video having a lower spatial resolution than the input video.

BRIEF DESCRIPTION OF THE DRAWINGS

[0020] FIG. 1 is a block diagram of a prior art transcoding system;

[0021] FIG. 2 is a flow chart of a prior art method for joint mode and quantizer selection based on rate and distortion values;

[0022] FIG. 3 is a block diagram of a video transcoding system with quantization selection, coding mode optimization and motion vector formation according to the invention;

[0023] FIGS. 4A-4C are flow charts of methods for processing motion vectors; and

[0024] FIG. 5 is a block diagram of a process for determining a multiplier λ used for optimal mode selection according to the invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

[0025] As shown in FIG. 3, the present invention provides a system and method 300 for transcoding compressed video streams from a high bit rate and a high resolution to a low bit rate and a low resolution. In contrast with prior art, the system and method according to the invention determines a quantization scale, a motion vector decision, and a mode decision in a cascaded manner to provide an optimal transcoding with lower complexity.

[0026] System Structure

[0027] The system 300 includes a video decoder 310, a downscale filter 320, and a video encoder 330 connected serially. The video decoder and the downscale filter provide input to a quantizer selector 340 and a motion vector (MV) processor 350. Mode selection 360 is based on determining 370 a multiplier value λ 371. In addition, the system includes a quantizer 380, motion vectors (MVs) 385, and modes 390 for the encoder 330. The letters A-E refer to input and output signals used by the MV processor 350, where A are input motion vectors, B are downscaled pictures, C are intermediate motion vectors, D are macroblock modes, and E are output motion vectors.

[0028] System Operation

[0029] An input compressed video stream 301 is received into the video decoder 310 for bitstream syntax decoding. The input video stream can be a progressive or interlaced video. With progressive video, each frame in the video sequence is raster scanned. Interlaced video has two fields per frame, which are referred to as the odd field and the even field. The odd field is scanned before the even field.

[0030] The decoding produces reconstructed pictures 311. A picture is defined to be a set of macroblocks. Depending on the input signal, the macroblocks can be a group of pixels in a frame or field. To be more specific, we can refer to a picture as a frame-picture or field-picture.

[0031] The reconstructed pictures 311 are represented in a Y, U, V format. The decoder also produces macroblock information 312. The macroblock information includes quantizer step sizes, macroblock coding modes and input motion vectors A.

[0032] Each YUV picture is downscaled 320, using sub-sampling, to a downscaled YUV picture 321 to meet a reduced spatial resolution requirement.
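The sub-sampling of paragraph [0032] can be illustrated with a minimal sketch. It assumes a 2:1 reduction in each dimension and a plain block average as the filter; the function name `downscale_plane` and the averaging filter are illustrative assumptions, not the filter prescribed by the invention.

```python
def downscale_plane(plane, factor=2):
    """Downscale one picture plane (a list of pixel rows).

    Assumption: each factor x factor block is replaced by its rounded
    mean, and the plane dimensions are multiples of `factor`.
    """
    height, width = len(plane), len(plane[0])
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [plane[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            # Rounded integer mean of the block.
            row.append((sum(block) + len(block) // 2) // len(block))
        out.append(row)
    return out


# Hypothetical 4x4 luma plane reduced to 2x2.
luma = [[10, 12, 20, 22],
        [14, 16, 24, 26],
        [30, 30, 40, 40],
        [30, 30, 40, 44]]
small = downscale_plane(luma)
```

The same routine would be applied to the Y, U and V planes separately; for interlaced input, each field would be treated as its own plane.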

[0033] The macroblock information 312 from the video decoder 310 and the downscaled YUV pictures 321 are received into the quantizer selector 340 and the MV processor 350. The downscaled pictures are then encoded 330 into the output compressed video stream 302 according to the quantizer Q, MVs 385, and a selected mode M.

[0034] Because the quantizer selection 340 and mode selection 360 are performed in separate modules, the coding mode 390 can be determined after the quantizer 380 has been selected. In other words, the mode selection 360 is only for a single quantizer, and not all possible quantizers, as in the prior art. As before, if we use Q to denote the set of all admissible quantizers, M to denote the set of all admissible coding modes, then the complexity of the system according to the invention is only Q+M, rather than Q×M as in the prior art. If Q>1 and M>1, then Q+M≤Q×M. As the values Q and M increase, the complexity of the system according to the invention increases at a much lower rate than the complexity of the prior art system.
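As a small worked illustration of the complexity argument, the following sketch counts candidate evaluations per macroblock for the 31 step sizes and the four example modes named in paragraph [0010]. The function names are hypothetical, and the count ignores the cost of each individual evaluation.

```python
def joint_search_evaluations(num_quantizers, num_modes):
    # Prior art: every mode is tried for every quantizer (double loop).
    return num_quantizers * num_modes


def cascaded_search_evaluations(num_quantizers, num_modes):
    # Cascaded: the quantizer is chosen first, then the modes are tried once.
    return num_quantizers + num_modes


Q, M = 31, 4  # step sizes 1..31; intra, no MC, MC frame, MC field
joint = joint_search_evaluations(Q, M)        # 124 evaluations
cascaded = cascaded_search_evaluations(Q, M)  # 35 evaluations
```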

[0035] Furthermore, the system 300 has greater flexibility than the prior art system. The quantizer selection 340 can be achieved by any means; therefore, the quantizer selector can be replaced by another similar module without affecting the overall operation of the system. In addition to quantizer selection and mode selection, various configurations of motion vector processing 350 are possible to greatly enhance the flexibility of the video transcoding system and method 300 according to the invention.

[0036] Quantizer Selection

[0037] Quantizer selection 340 can be achieved by any known means. For example, the well-known TM5 quantizer selection can be used, or any other optimal quantizer selection process can be used. The main point is that the quantizer selection process can be made separable from the mode decision to lower the complexity, while achieving a high quality.

[0038] Given a quantization parameter set q_(i)∈{1, . . . , 31}, ∀i=1, . . . , N, where N is the number of macroblocks in each picture, a minimum distortion D is subject to a bit rate constraint R 341,

min D subject to R < R_(picture),   (1)

[0039] with the total distortion D and the total number of bits R given by $D = \sum_{i=1}^{N} d_i(q_i), \quad R = \sum_{i=1}^{N} r_i(q_i).$   (2)

[0040] For a particular value χ, if a set of q_(i)* (χ) minimizes the following expression: $\min_{q_i} \left\{ d_i(q_i) + \chi \, r_i(q_i) \right\}, \quad \forall i = 1, \ldots, N,$   (3)

[0041] then the set of q_(i)* (χ) corresponds to an optimal solution to equation (1).

[0042] To determine an optimal operating point on the R-D curve, an optimal slope, χ*, is searched in equation (3), such that R(χ*)<R_(picture). The invention uses a fast convex search process.

[0043] Step-1:

[0044] Initialize two values χ₁ and χ₂ of χ, with χ₁<χ₂ satisfying a relation: $\sum_{i=1}^{N} R_i(\chi_1) < R_{picture} < \sum_{i=1}^{N} R_i(\chi_2).$

[0045] Step-2: $\chi_{next} = \frac{\chi_1 + \chi_2}{2}.$

[0046] Step-3:

[0047] Substitute χ₁ and χ_(next) into Equation (3), minimize the expression and derive q_(i)* (χ₁) and q_(i)* (χ_(next)), ∀i=1, . . . , N, respectively.

[0048] Step-4:

[0049] If [R(χ₁)−R_(picture)][R(χ_(next))−R_(picture)]<0, then replace χ₂ by χ_(next), otherwise, replace χ₁ by χ_(next).

[0050] Step-5:

[0051] If $\left| \frac{R(\chi_{next}) - R_{picture}}{R_{picture}} \right| < \varepsilon,$

[0052] where ε is a predetermined small positive number, then χ_(next) is taken as the optimal slope χ*, and q_(i)*, ∀i=1, . . . , N, is the optimal quantizer step size for each macroblock; else, go to Step-2.
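Steps 1 to 5 can be sketched as a bisection on χ. The following is a minimal illustration under stated assumptions: the per-macroblock models `distortion` and `rate`, the parameters `a` and `b`, and the iteration cap `max_iter` are hypothetical and are not taken from the invention. The sketch only requires that the two initial slopes bracket R_(picture), and the cap is added because the total rate is a step function of χ when the step sizes are discrete.

```python
QUANTIZERS = range(1, 32)  # admissible step sizes 1..31


def distortion(mb, q):
    # Hypothetical model: distortion grows with the square of the step size.
    return mb["a"] * q * q


def rate(mb, q):
    # Hypothetical model: bits fall off inversely with the step size.
    return mb["b"] / q


def best_quantizer(mb, chi):
    # Equation (3): minimize d_i(q_i) + chi * r_i(q_i) for one macroblock.
    return min(QUANTIZERS, key=lambda q: distortion(mb, q) + chi * rate(mb, q))


def total_rate(macroblocks, chi):
    quantizers = [best_quantizer(mb, chi) for mb in macroblocks]
    bits = sum(rate(mb, q) for mb, q in zip(macroblocks, quantizers))
    return bits, quantizers


def search_slope(macroblocks, r_picture, chi_1, chi_2, eps=0.05, max_iter=60):
    r_1, _ = total_rate(macroblocks, chi_1)
    r_2, _ = total_rate(macroblocks, chi_2)
    # Step-1: the two initial slopes must bracket the picture budget.
    if (r_1 - r_picture) * (r_2 - r_picture) > 0:
        raise ValueError("initial slopes do not bracket R_picture")
    for _ in range(max_iter):
        chi_next = (chi_1 + chi_2) / 2.0                   # Step-2
        r_1, _ = total_rate(macroblocks, chi_1)            # Step-3
        r_next, q_next = total_rate(macroblocks, chi_next)
        if (r_1 - r_picture) * (r_next - r_picture) < 0:   # Step-4
            chi_2 = chi_next
        else:
            chi_1 = chi_next
        if abs(r_next - r_picture) / r_picture < eps:      # Step-5
            break
    return chi_next, q_next, r_next


# Hypothetical picture of four macroblocks with a budget of 1500 bits.
macroblocks = [{"a": 1, "b": 1000}, {"a": 2, "b": 2000},
               {"a": 3, "b": 1500}, {"a": 4, "b": 3000}]
chi_star, q_star, r_star = search_slope(macroblocks, r_picture=1500.0,
                                        chi_1=0.0, chi_2=1000.0)
```

In a transcoder, `distortion` and `rate` would be replaced by measured or modeled values for each macroblock; the search itself would be unchanged.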

[0053] Motion Vector Processing

[0054] As shown in FIGS. 4A-4C, the motion vector (MV) processor 350 according to the present invention can have three configurations including MV mapping blocks 410 and MV refinement blocks 420. The configurations differ in the required computational complexity and in the quality that is achieved. In the following, we refer to a set of MVs, where the set can include any number of MVs associated with a macroblock, e.g., 1 MV, 4 MVs, etc.

[0055] In FIG. 4A, the MV processor 350 receives only the input MVs A from the decoder. The MV mapping 410 is performed, and the resulting set of intermediate or output MVs C/E is recorded. The MVs are used by the mode selection module, and are also sent to the encoder 330. The MV mapping 410 can be done in various ways using any prior art method. This is the lowest complexity configuration compared to the configurations described below.

[0056] In FIG. 4B, the MV mapping 410 is first performed based on the received motion information as above. The resulting set of intermediate MVs C is output to the mode selection module. The mode selection module then evaluates each mode based on the resulting set of MVs and sends the selected mode back to the MV processing module to refine 420 the set of MVs for the selected mode. To refine the set of MVs, the downscaled pictures B are used. The final set of MVs is then sent to the encoder 330.

[0057] In FIG. 4C, both the input motion vectors A and downscaled pictures B are received into the MV processing module. As before, the MV mapping 410 is first performed. Directly following the mapping, the MV refinement 420, with a small search window around the resulting motion vector, is performed. As for FIG. 4B, to refine the set of MVs, the downscaled pictures are used. The refined and final sets of MVs C/E are used for the mode selection module and are also sent to the encoder.

[0058] The key difference between the configurations of FIGS. 4B and 4C is that the selected mode is known in the first case when the MV refinement is done. Therefore some savings in computation are achieved because the MVs associated with the modes that have not been selected do not need to be estimated or refined.
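The mapping 410 and refinement 420 blocks can be sketched as follows. Because the text leaves the mapping method open, this sketch assumes one common choice: averaging the input MVs of the macroblocks that cover one output macroblock and scaling by the downscale factor. The refinement assumes a caller-supplied `cost` function standing in for a block-matching error measured on the downscaled pictures B. All names here are hypothetical.

```python
def map_motion_vectors(input_mvs, factor=2):
    """Map the input MVs covering one output macroblock to a single MV.

    Assumption: average the input vectors, then scale by 1/factor.
    """
    n = len(input_mvs)
    avg_x = sum(mv[0] for mv in input_mvs) / n
    avg_y = sum(mv[1] for mv in input_mvs) / n
    return (round(avg_x / factor), round(avg_y / factor))


def refine_motion_vector(mv, cost, window=1):
    """Search a (2*window+1)^2 neighbourhood of mv for the lowest cost."""
    candidates = [(mv[0] + dx, mv[1] + dy)
                  for dy in range(-window, window + 1)
                  for dx in range(-window, window + 1)]
    return min(candidates, key=cost)


# Four input macroblocks map to one output macroblock at 2:1 downscaling.
input_mvs = [(8, 4), (8, 4), (12, 8), (12, 8)]
mapped = map_motion_vectors(input_mvs)


def toy_cost(mv):
    # Stand-in for a matching error; its minimum is placed at (6, 3).
    return abs(mv[0] - 6) + abs(mv[1] - 3)


refined = refine_motion_vector(mapped, toy_cost)
```

In the configuration of FIG. 4A only `map_motion_vectors` would run; in FIG. 4B the refinement would run once for the selected mode, and in FIG. 4C it would run for every candidate mode before mode selection.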

[0059] Optimal Mode Selection

[0060] The resulting quantization scale (Q) and motion vector (MV) for each macroblock are received into the optimal mode selection module 360. Based on the optimized quantization scales, a Lagrangian rate-distortion process selects the coding mode M for each macroblock according to a cost function: $J_i(\lambda, M_k | q_i) = \min_{M_k} \left\{ D_i(M_k | q_i) + \lambda \, R_i(M_k | q_i) \right\}.$   (4)

[0061] A multiplier λ for the Lagrangian rate-distortion function is obtained by setting the derivative of the cost J with respect to the rate R to zero, i.e., $\frac{\partial J}{\partial R} = \frac{\partial D}{\partial R} + \lambda = 0,$   (5)

[0062] which yields $\lambda = -\frac{\partial D}{\partial R}.$   (6)

[0063] As shown in FIG. 5, the value for the multiplier λ can be obtained by the following approximation: $\lambda = -\frac{\partial D}{\partial R} \approx -\frac{\Delta D}{\Delta R} = \frac{D(q) - D(q - \delta)}{R(q - \delta) - R(q)},$   (7)

[0064] because the quantizer q_(i) and motion vector MV are known for each macroblock. The process uses a differential distortion ΔD block 510, and a differential rate ΔR block 520.

[0065] For each candidate mode, the cost function (4) is evaluated using the multiplier λ, and the mode that minimizes the cost is selected as the transcoding mode for the macroblock. Because the multiplier λ is obtained without iteration, the complexity of finding the optimal coding mode is greatly reduced.
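The multiplier estimate and the mode decision can be sketched together. This minimal illustration computes λ from the finite difference −ΔD/ΔR of equation (7) and then evaluates the cost of equation (4) for each candidate mode at the already selected quantizer. The rate and distortion models and the per-mode measurements are made-up values for illustration only, and the sketch assumes q > δ.

```python
def estimate_multiplier(distortion_at, rate_at, q, delta=1):
    """Finite-difference estimate of lambda = -dD/dR at quantizer q."""
    delta_d = distortion_at(q) - distortion_at(q - delta)
    delta_r = rate_at(q) - rate_at(q - delta)
    return -delta_d / delta_r


def select_mode(candidates, lam):
    """Pick the mode with the lowest cost D + lambda * R.

    `candidates` maps a mode name to (distortion, rate), both measured
    at the quantizer that was already selected for the macroblock.
    """
    return min(candidates,
               key=lambda mode: candidates[mode][0] + lam * candidates[mode][1])


# Hypothetical models for one macroblock whose selected quantizer is q = 8.
lam = estimate_multiplier(lambda q: 2.0 * q * q, lambda q: 1200.0 / q, q=8)

# Hypothetical (distortion, rate) pairs for the four example modes.
candidates = {
    "intra": (40.0, 300.0),
    "no_mc": (200.0, 60.0),
    "mc_frame": (90.0, 120.0),
    "mc_field": (80.0, 150.0),
}
best = select_mode(candidates, lam)
```

With these made-up numbers λ is about 1.4 and the frame motion-compensated mode has the lowest cost; only one pass over the modes is needed because λ is computed directly rather than searched.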

[0066] After determining the quantization scale, the optimal coding mode and the motion vector for the macroblock, the encoder 330 codes the quantized macroblock with the optimal quantization scale, the selected encoding mode M 390 and the motion vectors 385 to generate the transcoded bitstream 302.

[0067] Although the invention has been described by way of examples of preferred embodiments, it is to be understood that various other adaptations and modifications can be made within the spirit and scope of the invention. Therefore, it is the object of the appended claims to cover all such variations and modifications as come within the true spirit and scope of the invention.

We claim:
 1. A method for transcoding an input video to a lower spatial resolution, comprising the steps of: decoding the input video into pictures, each picture including a set of macroblocks; filtering each picture to a downscaled picture having a lower spatial resolution; selecting a quantization scale for each macroblock in the downscaled picture; generating a set of motion vectors for each macroblock in the downscaled picture; determining a multiplier value based on the quantization scale for each macroblock in the downscaled picture; selecting one of a plurality of encoding modes for each macroblock in the downscaled picture according to the quantization scale, the motion vectors, and the multiplier value; and encoding each macroblock in each downscaled picture according to the quantization scale, the selected encoding mode, and the set of motion vectors to produce an output video having a lower spatial resolution than the input video.
 2. The method of claim 1 wherein the video is a progressive video, and the picture is a frame-picture.
 3. The method of claim 1 wherein the video is an interlaced video, and the picture is a field-picture.
 4. The method of claim 1 wherein the decoding produces quantizer step sizes for selecting the quantization scale, the encoding mode, and motion vectors.
 5. The method of claim 1 wherein each macroblock of each picture includes input motion vectors, and the generating further comprises: mapping the input motion vectors to produce the motion vectors for the selecting and encoding.
 6. The method of claim 1 wherein each macroblock of each picture includes input motion vectors, and the generating further comprises: mapping the input motion vectors to intermediate motion vectors for the selecting; refining the intermediate motion vectors based on the downscaled picture and the selected mode to produce the motion vectors for the encoding.
 7. The method of claim 1 wherein each macroblock of each picture includes input motion vectors, and the generating further comprises: mapping the input motion vectors to intermediate motion vectors; refining the intermediate motion vectors based on the downscaled picture to produce the motion vectors for the selecting and encoding.
 8. The method of claim 1 further comprising: minimizing a rate-distortion curve to determine the quantization scale.
 9. The method of claim 1 further comprising: evaluating a cost function to select the encoding mode for each macroblock in each downscaled picture.
 10. The method of claim 9 wherein the cost function is $J_i(\lambda, M_k | q_i) = \min_{M_k} \left\{ D_i(M_k | q_i) + \lambda \, R_i(M_k | q_i) \right\},$ where λ is a multiplier, M is the selected mode, q is the quantizer scale, D is a distortion, and R is a bit rate.
 11. The method of claim 10 wherein the multiplier λ for the rate function R(.) is obtained by setting a derivative of the rate function to zero according to $\frac{\partial J}{\partial R} = \frac{\partial D}{\partial R} + \lambda = 0,$ to obtain $\lambda = -\frac{\partial D}{\partial R}.$
 12. The method of claim 11 wherein the derivative is approximated by $\lambda = -\frac{\partial D}{\partial R} \approx -\frac{\Delta D}{\Delta R} = \frac{D(q) - D(q - \delta)}{R(q - \delta) - R(q)}.$
 13. A system for transcoding an input video to a lower spatial resolution, comprising: means for decoding the input video into pictures, each picture including a set of macroblocks; means for filtering each picture to a downscaled picture having a lower spatial resolution; means for selecting a quantization scale for each macroblock in the downscaled picture; means for generating a set of motion vectors for each macroblock in the downscaled picture; means for determining a multiplier value based on the quantization scale for each macroblock in the downscaled picture; means for selecting one of a plurality of encoding modes for each macroblock in the downscaled picture according to the quantization scale, the motion vectors, and the multiplier value; and means for encoding each macroblock in each downscaled picture according to the quantization scale, the selected encoding mode, and the set of motion vectors.